Manipulaciones Basicas con dplyr


In [ ]:

The package dplyr offers functions for: • Filtering of observations • Variable selection • Recoding • Grouping • Aggregation (in groups)


In [1]:
data(Cars93, package = "MASS")

In [2]:
library("dplyr")


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union


In [3]:
## ---- eval = TRUE, echo = TRUE-------------------------------------------
class (Cars93)

## ---- eval = TRUE, echo = TRUE-------------------------------------------
Cars93 <- tbl_df(Cars93)
class(Cars93)


'data.frame'
  1. 'tbl_df'
  2. 'tbl'
  3. 'data.frame'

In [4]:
slice(Cars93, 1)


ManufacturerModelTypeMin.PricePriceMax.PriceMPG.cityMPG.highwayAirBagsDriveTrain...PassengersLengthWheelbaseWidthTurn.circleRear.seat.roomLuggage.roomWeightOriginMake
Acura Integra Small 12.9 15.9 18.8 25 31 None Front ... 5 177 102 68 37 26.5 11 2705 non-USA Acura Integra

In [5]:
slice (Cars93, c(1,4,10,15, n ()))


ManufacturerModelTypeMin.PricePriceMax.PriceMPG.cityMPG.highwayAirBagsDriveTrain...PassengersLengthWheelbaseWidthTurn.circleRear.seat.roomLuggage.roomWeightOriginMake
Acura Integra Small 12.9 15.9 18.8 25 31 None Front ... 5 177 102 68 37 26.5 11 2705 non-USA Acura Integra
Audi 100 Midsize 30.8 37.7 44.6 19 26 Driver & Passenger Front ... 6 193 106 70 37 31.0 17 3405 non-USA Audi 100
Cadillac DeVille Large 33.0 34.7 36.3 16 25 Driver only Front ... 6 206 114 73 43 35.0 18 3620 USA Cadillac DeVille
Chevrolet Lumina Midsize 13.4 15.9 18.4 21 29 None Front ... 6 198 108 71 40 28.5 16 3195 USA Chevrolet Lumina
Volvo 850 Midsize 24.8 26.7 28.5 20 28 Driver & Passenger Front ... 5 184 105 69 38 30.0 15 3245 non-USA Volvo 850

In [6]:
filter(Cars93, Manufacturer == "Audi" & Min.Price > 25)


ManufacturerModelTypeMin.PricePriceMax.PriceMPG.cityMPG.highwayAirBagsDriveTrain...PassengersLengthWheelbaseWidthTurn.circleRear.seat.roomLuggage.roomWeightOriginMake
Audi 90 Compact 25.9 29.1 32.3 20 26 Driver only Front ... 5 180 102 67 37 28 14 3375 non-USA Audi 90
Audi 100 Midsize 30.8 37.7 44.6 19 26 Driver & PassengerFront ... 6 193 106 70 37 31 17 3405 non-USA Audi 100

In [7]:
subset(Cars93, Manufacturer == "Audi" & Min.Price > 25)


ManufacturerModelTypeMin.PricePriceMax.PriceMPG.cityMPG.highwayAirBagsDriveTrain...PassengersLengthWheelbaseWidthTurn.circleRear.seat.roomLuggage.roomWeightOriginMake
Audi 90 Compact 25.9 29.1 32.3 20 26 Driver only Front ... 5 180 102 67 37 28 14 3375 non-USA Audi 90
Audi 100 Midsize 30.8 37.7 44.6 19 26 Driver & PassengerFront ... 6 193 106 70 37 31 17 3405 non-USA Audi 100

In [8]:
head(arrange(Cars93, desc (MPG.city), Max.Price), 7)


ManufacturerModelTypeMin.PricePriceMax.PriceMPG.cityMPG.highwayAirBagsDriveTrain...PassengersLengthWheelbaseWidthTurn.circleRear.seat.roomLuggage.roomWeightOriginMake
Geo Metro Small 6.7 8.4 10.0 46 50 None Front ... 4 151 93 63 34 27.5 10 1695 non-USA Geo Metro
Honda Civic Small 8.4 12.1 15.8 42 46 Driver only Front ... 4 173 103 67 36 28.0 12 2350 non-USA Honda Civic
Suzuki Swift Small 7.3 8.6 10.0 39 43 None Front ... 4 161 93 63 34 27.5 10 1965 non-USA Suzuki Swift
Subaru Justy Small 7.3 8.4 9.5 33 37 None 4WD ... 4 146 90 60 32 23.5 10 2045 non-USA Subaru Justy
Toyota Tercel Small 7.8 9.8 11.8 32 37 Driver only Front ... 5 162 94 65 36 24.0 11 2055 non-USA Toyota Tercel
Ford Festiva Small 6.9 7.4 7.9 31 33 None Front ... 4 141 90 63 33 26.0 12 1845 USA Ford Festiva
Pontiac LeMans Small 8.2 9.0 9.9 31 41 None Front ... 4 177 99 66 35 25.5 17 2350 USA Pontiac LeMans

In [9]:
head (select (Cars93, Manufacturer, Price), 3)


ManufacturerPrice
Acura15.9
Acura33.9
Audi 29.1

In [10]:
head (select (Cars93, contains ("Price")), 3)


Min.PricePriceMax.Price
12.915.918.8
29.233.938.7
25.929.132.3

In [11]:
head (select (Cars93, myPrize = Price, Min.Price))


myPrizeMin.Price
15.912.9
33.929.2
29.125.9
37.730.8
30.023.7
15.714.2

In [12]:
m <- mutate(Cars93, is_ford = Manufacturer == "Ford")
m[1:3, c(1,28)]


Manufactureris_ford
AcuraFALSE
AcuraFALSE
Audi FALSE

In [13]:
by_type <- group_by (Cars93, Type)
summarize (by_type,
 count = n(),min_es = min(EngineSize),
 max_es = max(EngineSize)
)


Typecountmin_esmax_es
Compact16 2.0 3.0
Large 11 3.3 5.7
Midsize22 2.0 4.6
Small 21 1.0 2.2
Sporty 14 1.3 5.7
Van 9 2.4 4.3

In [14]:
Cars93 %>%
 group_by(Type) %>%
 summarize(count = n(), min_es = min(EngineSize), max_es =
max(EngineSize) )


Typecountmin_esmax_es
Compact16 2.0 3.0
Large 11 3.3 5.7
Midsize22 2.0 4.6
Small 21 1.0 2.2
Sporty 14 1.3 5.7
Van 9 2.4 4.3

In [15]:
Cars93 %>% group_by(Type) %>% slice(1:2)


ManufacturerModelTypeMin.PricePriceMax.PriceMPG.cityMPG.highwayAirBagsDriveTrain...PassengersLengthWheelbaseWidthTurn.circleRear.seat.roomLuggage.roomWeightOriginMake
Audi 90 Compact 25.9 29.1 32.3 20 26 Driver only Front ... 5 180 102 67 37 28.0 14 3375 non-USA Audi 90
Chevrolet Cavalier Compact 8.5 13.4 18.3 25 36 None Front ... 5 182 101 66 38 25.0 13 2490 USA Chevrolet Cavalier
Buick LeSabre Large 19.9 20.8 21.7 19 28 Driver only Front ... 6 200 111 74 42 30.5 17 3470 USA Buick LeSabre
Buick Roadmaster Large 22.6 23.7 24.9 16 25 Driver only Rear ... 6 216 116 78 45 30.5 21 4105 USA Buick Roadmaster
Acura Legend Midsize 29.2 33.9 38.7 18 25 Driver & Passenger Front ... 5 195 115 71 38 30.0 15 3560 non-USA Acura Legend
Audi 100 Midsize 30.8 37.7 44.6 19 26 Driver & Passenger Front ... 6 193 106 70 37 31.0 17 3405 non-USA Audi 100
Acura Integra Small 12.9 15.9 18.8 25 31 None Front ... 5 177 102 68 37 26.5 11 2705 non-USA Acura Integra
Dodge Colt Small 7.9 9.2 10.6 29 33 None Front ... 5 174 98 66 32 26.5 11 2270 USA Dodge Colt
Chevrolet Camaro Sporty 13.4 15.1 16.8 19 28 Driver & Passenger Rear ... 4 193 101 74 43 25.0 13 3240 USA Chevrolet Camaro
Chevrolet Corvette Sporty 34.6 38.0 41.5 17 25 Driver only Rear ... 2 179 96 74 43 NA NA 3380 USA Chevrolet Corvette
Chevrolet Lumina_APV Van 14.7 16.3 18.0 18 23 None Front ... 7 178 110 74 44 30.5 NA 3715 USA Chevrolet Lumina_APV
Chevrolet Astro Van 14.7 16.6 18.6 15 20 None 4WD ... 8 194 111 78 42 33.5 NA 4025 USA Chevrolet Astro

In [16]:
Cars93 %>% mutate(ES2 = EngineSize^2) %>% group_by(Type) %>%
summarize(min.ES2 = min(ES2)) %>% arrange(desc(min.ES2))


Typemin.ES2
Large 10.89
Van 5.76
Compact 4.00
Midsize 4.00
Sporty 1.69
Small 1.00

wINDOWS


In [17]:
Cars93 %>%
 group_by(Type) %>%
 arrange(Type) %>%
 select(Manufacturer:Price) %>%
 mutate(cmean = cummean(Price), csum = cumsum(Price))


ManufacturerModelTypeMin.PricePricecmeancsum
Audi 90 Compact 25.9 29.1 29.10000 29.1
Chevrolet Cavalier Compact 8.5 13.4 21.25000 42.5
Chevrolet Corsica Compact 11.4 11.4 17.96667 53.9
Chrysler LeBaron Compact 14.5 15.8 17.42500 69.7
Dodge Spirit Compact 11.9 13.3 16.60000 83.0
Ford Tempo Compact 10.4 11.3 15.71667 94.3
Honda Accord Compact 13.8 17.5 15.97143 111.8
Mazda 626 Compact 14.3 16.5 16.03750 128.3
Mercedes-Benz 190E Compact 29.0 31.9 17.80000 160.2
Nissan Altima Compact 13.0 15.7 17.59000 175.9
Oldsmobile Achieva Compact 13.0 13.5 17.21818 189.4
Pontiac Sunbird Compact 9.4 11.1 16.70833 200.5
Saab 900 Compact 20.3 28.7 17.63077 229.2
Subaru Legacy Compact 16.3 19.5 17.76429 248.7
Volkswagen Passat Compact 17.6 20.0 17.91333 268.7
Volvo 240 Compact 21.8 22.7 18.21250 291.4
Buick LeSabre Large 19.9 20.8 20.80000 20.8
Buick Roadmaster Large 22.6 23.7 22.25000 44.5
Cadillac DeVille Large 33.0 34.7 26.40000 79.2
Chevrolet Caprice Large 18.0 18.8 24.50000 98.0
Chrylser Concorde Large 18.4 18.4 23.28000 116.4
Chrysler Imperial Large 29.5 29.5 24.31667 145.9
Eagle Vision Large 17.5 19.3 23.60000 165.2
Ford Crown_VictoriaLarge 20.1 20.9 23.26250 186.1
Lincoln Town_Car Large 34.4 36.1 24.68889 222.2
Oldsmobile Eighty-Eight Large 19.5 20.7 24.29000 242.9
Pontiac Bonneville Large 19.4 24.4 24.30000 267.3
Acura Legend Midsize 29.2 33.9 33.90000 33.9
Audi 100 Midsize 30.8 37.7 35.80000 71.6
BMW 535i Midsize 23.7 30.0 33.86667 101.6
.....................
Pontiac LeMans Small 8.2 9.0 10.37333 155.6
Saturn SL Small 9.2 11.1 10.41875 166.7
Subaru Justy Small 7.3 8.4 10.30000 175.1
Subaru Loyale Small 10.5 10.9 10.33333 186.0
Suzuki Swift Small 7.3 8.6 10.24211 194.6
Toyota Tercel Small 7.8 9.8 10.22000 204.4
VolkswagenFox Small 8.7 9.1 10.16667 213.5
Chevrolet Camaro Sporty 13.4 15.1 15.10000 15.1
Chevrolet Corvette Sporty 34.6 38.0 26.55000 53.1
Dodge Stealth Sporty 18.5 25.8 26.30000 78.9
Ford Mustang Sporty 10.8 15.9 23.70000 94.8
Ford Probe Sporty 12.8 14.0 21.76000 108.8
Geo Storm Sporty 11.5 12.5 20.21667 121.3
Honda Prelude Sporty 17.0 19.8 20.15714 141.1
Hyundai Scoupe Sporty 9.1 10.0 18.88750 151.1
Mazda RX-7 Sporty 32.5 32.5 20.40000 183.6
Mercury Capri Sporty 13.3 14.1 19.77000 197.7
Plymouth Laser Sporty 11.4 14.4 19.28182 212.1
Pontiac Firebird Sporty 14.0 17.7 19.15000 229.8
Toyota Celica Sporty 14.2 18.4 19.09231 248.2
VolkswagenCorrado Sporty 22.9 23.3 19.39286 271.5
Chevrolet Lumina_APVVan 14.7 16.3 16.30000 16.3
Chevrolet Astro Van 14.7 16.6 16.45000 32.9
Dodge Caravan Van 13.6 19.0 17.30000 51.9
Ford Aerostar Van 14.5 19.9 17.95000 71.8
Mazda MPV Van 16.6 19.1 18.18000 90.9
Nissan Quest Van 16.7 19.1 18.33333 110.0
OldsmobileSilhouetteVan 19.5 19.5 18.50000 129.5
Toyota Previa Van 18.9 22.7 19.02500 152.2
VolkswagenEurovan Van 16.6 19.7 19.10000 171.9

In [20]:
install.packages('data.table')


Installing package into 'C:/Users/admin/Documents/R/win-library/3.3'
(as 'lib' is unspecified)
also installing the dependency 'chron'

package 'chron' successfully unpacked and MD5 sums checked
package 'data.table' successfully unpacked and MD5 sums checked

The downloaded binary packages are in
	C:\Users\admin\AppData\Local\Temp\RtmpO0Tqzw\downloaded_packages

In [21]:
require(data.table)
Cars93 <- data.table(Cars93)


Loading required package: data.table
------------------------------------------------------------------------------
data.table + dplyr code now lives in dtplyr.
Please library(dtplyr)!
------------------------------------------------------------------------------

Attaching package: 'data.table'

The following objects are masked from 'package:dplyr':

    between, last


In [23]:
install.packages('microbenchmark')


Installing package into 'C:/Users/admin/Documents/R/win-library/3.3'
(as 'lib' is unspecified)
also installing the dependencies 'colorspace', 'RColorBrewer', 'dichromat', 'munsell', 'labeling', 'gtable', 'plyr', 'reshape2', 'scales', 'ggplot2'

package 'colorspace' successfully unpacked and MD5 sums checked
package 'RColorBrewer' successfully unpacked and MD5 sums checked
package 'dichromat' successfully unpacked and MD5 sums checked
package 'munsell' successfully unpacked and MD5 sums checked
package 'labeling' successfully unpacked and MD5 sums checked
package 'gtable' successfully unpacked and MD5 sums checked
package 'plyr' successfully unpacked and MD5 sums checked
package 'reshape2' successfully unpacked and MD5 sums checked
package 'scales' successfully unpacked and MD5 sums checked
package 'ggplot2' successfully unpacked and MD5 sums checked
package 'microbenchmark' successfully unpacked and MD5 sums checked

The downloaded binary packages are in
	C:\Users\admin\AppData\Local\Temp\RtmpO0Tqzw\downloaded_packages

In [24]:
require(microbenchmark)
N <- 1000000
dat<- data.table(
  x=sample(LETTERS[1:20], N, replace=TRUE),
  y=sample(letters[1:5], N, replace=TRUE))
head(dat, 3)

setkey(dat, x,y)

microbenchmark(
  data.table = dat[list(c("B", "D"), c("b", "d"))],
  dplyr = dat %>% slice(x %in% c("B", "D") & y %in% c("b", "d")),
  baseR = dat[x %in% c("B", "D") & y %in% c("b", "d")]
)


Loading required package: microbenchmark
xy
Ka
Qd
Gc
exprtime
data.table 1950543
data.table 1006286
baseR 55832576
data.table 1303486
baseR 58823766
baseR 54660286
baseR 58205715
baseR 53069864
data.table 1062067
baseR 57616669
dplyr 119652071
dplyr 105449384
data.table 1177644
baseR 53854365
data.table 1207543
baseR 102857584
data.table 1069206
data.table 936671
dplyr 59487781
baseR 57939306
data.table 1091519
dplyr 61305343
data.table 1316428
dplyr 62308058
baseR 110325966
data.table 1057158
dplyr 108256720
data.table 1256631
dplyr 55326532
data.table 1355251
......
data.table 1024136
data.table 948720
baseR 53621424
data.table 1086610
data.table 1006286
baseR 57465838
baseR 51934167
baseR 100115846
dplyr 57611761
data.table 1193709
baseR 53329133
dplyr 56833953
dplyr 56365394
baseR 58661332
baseR 53363940
baseR 57207907
baseR 54375134
baseR 58050421
data.table 1087503
baseR 59920194
baseR 53611161
baseR 58632773
baseR 54360409
baseR 58321293
dplyr 104288250
data.table 1274034
dplyr 55624179
dplyr 60385183
dplyr 63381728
baseR 57344905

In [ ]:


In [ ]:


In [ ]: